Linking article parts for the creation of a newspaper digital library
Abstract
An important issue in the retro-conversion of newspapers, i.e. the conversion of newspaper issues into digital resources, is the identification and appropriate digital representation of an article. This task involves a number of steps, from segmentation of the newspaper image, through optical character recognition, to linking the different items that belong to the same article. This paper presents an evaluation of information retrieval techniques for linking the textual parts of an article that appear on different pages of a newspaper issue. Three document matching techniques are evaluated: title-to-title, title-to-text and text-to-text matching. For each of these approaches, the paper also studies how matching accuracy is affected by using a stemmer and by applying appropriate conflict resolution techniques. Experiments on a number of issues of a Greek newspaper show that the best technique, text-to-text matching combined with a stemmer and conflict resolution, reaches a linking accuracy of 96%.
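The abstract names the ingredients of the linking step but not their internals, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. Each article part is treated as a bag of words and compared by cosine similarity over raw term counts; `strip_suffix` is a hypothetical, crude English suffix stripper standing in for the real (Greek) stemmer; and conflict resolution is modelled as a greedy one-to-one assignment in which each article start can receive at most one continuation. The names `link_parts`, `heads` and `continuations` are illustrative. Title-to-title, title-to-text and text-to-text matching would differ only in which fields (titles or body texts) are passed in.

```python
import math
import re
from collections import Counter

# Hypothetical crude suffix list: an assumption standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def strip_suffix(word):
    """Remove the first matching suffix, keeping a stem of at least 3 chars."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def tokenize(text, stem=False):
    """Lowercase word tokens (\\w+ also matches Greek letters in Python 3)."""
    words = re.findall(r"\w+", text.lower())
    return [strip_suffix(w) for w in words] if stem else words


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def link_parts(heads, continuations, stem=True, resolve_conflicts=True):
    """Map each continuation id to the id of the article start it belongs to.

    heads / continuations: dicts of id -> text (titles or body text).
    """
    hv = {h: Counter(tokenize(t, stem)) for h, t in heads.items()}
    cv = {c: Counter(tokenize(t, stem)) for c, t in continuations.items()}
    # All candidate pairs, best-scoring first.
    scores = sorted(
        ((cosine(cv[c], hv[h]), c, h) for c in cv for h in hv),
        reverse=True,
    )
    links, used_heads = {}, set()
    for score, c, h in scores:
        if c in links or score == 0.0:
            continue  # already linked, or no shared terms at all
        if resolve_conflicts and h in used_heads:
            continue  # greedy one-to-one: this start is already taken
        links[c] = h
        used_heads.add(h)
    return links


heads = {"A": "tax law vote", "B": "tax farmers"}
continuations = {"x": "tax law vote", "y": "tax law vote farmers"}
print(link_parts(heads, continuations, resolve_conflicts=False))
print(link_parts(heads, continuations))
```

In the usage lines, both continuations score highest against `A`, so without conflict resolution each simply takes its best start and they collide on `A`. The greedy pass accepts the highest-scoring pair (`x`, `A`) first and then falls back to `y`'s next-best start, `B`.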
Similar resources
Investigating the Level of Observance of Knowledge Management Criteria on the Websites of Selected Digital Libraries in Iran
Background and Aim: Attention to the elements of knowledge management (availability, creation, and transfer of knowledge) is very important for digital library websites and improves their performance. This paper therefore aims to identify the knowledge management criteria on the websites of selected Iranian digital libraries and to assess the extent to which they are observed. Materials and Methods: The research method was des...
An integrated system for creating a Digital Library from Newspaper Archives
Newspapers are considered to be the first draft of history and, at the same time, part of a country’s cultural heritage. By converting newspaper archives to digital resources, we achieve digital preservation, preventing further paper deterioration, while also enabling full use of the archives by all interested parties. In this paper, we present a series of applications pertainin...
Linking Historical Ship Records to a Newspaper Archive
Linking historical datasets and making them available on the Web has increasingly become a subject of research in the field of digital humanities. In this paper, we focus on discovering links between ships from a dataset of Dutch maritime events and a historical archive of newspaper articles. We apply a heuristic-based method for finding and filtering links between ship instances; subsequently,...
Investigating the Level of Observance of User Interface Evaluation Criteria in Library Services Provided to Blind and Deaf Users in the World
Purpose: Digital library user interfaces play a decisive role in the performance of such libraries. Digital library service providers for blind and deaf users perform best when these users can interact properly with them. This study aims to evaluate and analyze the criteria related to user interface in digital libraries servi...
A Superimposed Information-Supported Digital Library
Selecting and annotating multimedia information at varying document granularities (from parts of a document, to a complete document, to multiple documents); linking new content with existing content at varying document granularities; organizing/arranging annotated information; sharing and reusing new information (annotations, structures, etc.) and associated existing information (for example, seein...
Publication date: 2000